Identifying the spans in medical text that correspond to medical entities is one of the core steps for many healthcare NLP tasks, such as ICD coding, medical finding extraction, and medical note contextualization. Existing entity-extraction methods rely on a fixed and limited vocabulary of medical entities and have difficulty extracting entities represented by disjoint spans. In this paper, we present a new transformer-based architecture called OSLAT, the Open Set Label Attention Transformer, which addresses many of the limitations of previous methods. Our approach uses a label-attention mechanism to implicitly learn the spans associated with entities of interest. These entities can be provided as free text, including entities not seen during OSLAT's training, and the model can extract spans even when they are disjoint. To test the generalizability of our method, we train two separate models on two different datasets with very low entity overlap: (1) a public discharge-notes dataset from hNLP, and (2) a much more challenging proprietary patient text dataset, "Reasons for Encounter" (RFE). We find that OSLAT models trained on either dataset outperform rule-based and fuzzy string-matching baselines when applied to the RFE dataset as well as to the portion of the hNLP dataset where entities are represented by disjoint spans. Our code can be found at https://github.com/curai/curai-research/tree/main/oslat.
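A minimal sketch of the label-attention idea described above, assuming contextual token embeddings from a transformer encoder and a pooled embedding of the free-text entity; the class and variable names are illustrative and this is not OSLAT's actual implementation:

```python
# Minimal label-attention sketch (illustrative only, not OSLAT's code).
# A free-text entity is encoded into a query vector; attention over the
# note's token embeddings yields a soft "span" for that entity.
import torch
import torch.nn as nn


class LabelAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)
        self.key_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, entity_emb: torch.Tensor, token_embs: torch.Tensor) -> torch.Tensor:
        # entity_emb: (batch, hidden)          pooled embedding of the entity text
        # token_embs: (batch, seq_len, hidden) contextual embeddings of the note
        q = self.query_proj(entity_emb).unsqueeze(1)   # (batch, 1, hidden)
        k = self.key_proj(token_embs)                  # (batch, seq_len, hidden)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5   # (batch, seq_len)
        return scores.softmax(dim=-1)                  # soft span over tokens


# Usage: tokens with high attention weight are taken as the entity's span.
# No contiguity constraint is imposed, which is what allows disjoint spans.
attn = LabelAttention(hidden_dim=768)
weights = attn(torch.randn(2, 768), torch.randn(2, 128, 768))
print(weights.shape)  # torch.Size([2, 128])
```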
We present MedCod, a medically accurate, emotive, diverse, and controllable dialog system with a unique approach to the natural language generator module. MedCod has been developed and evaluated specifically for the history-taking task. It integrates the advantages of a traditional modular approach, incorporating (medical) domain knowledge, with modern deep learning techniques to generate flexible, human-like natural language expressions. Two key aspects of MedCod's natural language output are described in detail. First, the generated sentences are emotive, similar to how a doctor communicates with a patient. Second, the generated sentence structures and phrasings are diverse while maintaining medical consistency with the desired medical concept (provided by MedCod's dialogue manager module). Experimental results demonstrate the effectiveness of our approach in creating a human-like medical dialogue system. Relevant code is available at https://github.com/curai/curai-research/tree/main/medcod
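To illustrate how a medical concept from a dialogue manager together with a control code could steer a neural generator, here is a hedged sketch; the stand-in checkpoint, prompt format, and control tokens are assumptions for illustration and do not reflect MedCod's actual pipeline:

```python
# Illustrative control-code conditioned generation (not MedCod's pipeline).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # stand-in; a system like this would fine-tune its own NLG model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The dialogue manager supplies the medical concept; an emotion control code
# steers the phrasing, and sampling yields diverse surface forms.
prompt = "<emotion=empathetic> <concept=cough duration> ask the patient"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, top_p=0.9, num_return_sequences=3)
for o in outputs:
    print(tokenizer.decode(o, skip_special_tokens=True))
```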
Medical conversation summarization is integral to capturing the information gathered during interactions between patients and physicians. Summarized conversations are used to facilitate patient hand-offs between physicians and as part of providing care in the future. However, summaries can be time-consuming to produce and require domain expertise. Modern pre-trained NLP models such as PEGASUS have emerged as capable alternatives to human summarization, reaching state-of-the-art performance on many summarization benchmarks. However, many downstream tasks still require at least moderately sized datasets to achieve satisfactory performance. In this work, we (1) explore the effect of dataset size on transfer learning for medical conversation summarization using PEGASUS, and (2) evaluate various iterative labeling strategies in the low-data regime, following their success in the classification setting. We find that model performance saturates as dataset size increases, and that the various active-learning strategies evaluated all show performance equivalent to simply increasing the dataset size. We also find that naive iterative pseudo-labeling is on par with or slightly worse than no pseudo-labeling. Our work sheds light on the successes and challenges of translating low-data-regime techniques to medical conversation summarization, helping to guide future work in this space. Relevant code is available at \url{https://github.com/curai/curai-research/tree/main/medical-summarization-ml4h-2021}.
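A hedged sketch of a naive iterative pseudo-labeling loop of the kind evaluated here, using the Hugging Face PEGASUS interfaces; the checkpoint and the toy conversation data are placeholders rather than the paper's setup:

```python
# Naive iterative pseudo-labeling sketch with PEGASUS (placeholders, not the paper's data).
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

checkpoint = "google/pegasus-xsum"  # stand-in; the paper fine-tunes on medical dialogs
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

labeled = [("patient reports 3 days of fever ...", "3-day history of fever.")]
unlabeled = ["patient reports sore throat and cough since yesterday ..."]

# Step 1: fine-tune `model` on `labeled` (standard seq2seq fine-tuning, omitted here).
# Step 2: pseudo-label the unlabeled conversations with the current model.
batch = tokenizer(unlabeled, truncation=True, padding=True, return_tensors="pt")
summaries = model.generate(**batch, num_beams=4, max_length=64)
pseudo = [(text, tokenizer.decode(s, skip_special_tokens=True))
          for text, s in zip(unlabeled, summaries)]

# Step 3: add the pseudo-labeled pairs to the training set and repeat from Step 1.
labeled.extend(pseudo)
```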
Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, which can be alleviated with global attention but only at the cost of quadratic computational complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, that holds three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing, as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency, with complexity linear in the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism, as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
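A small sketch of an MLP-Mixer-style block applied to graph "patch" embeddings (for example, patches obtained by partitioning the graph and pooling each part with a local GNN); the dimensions and layer choices are illustrative, not the authors' exact definitions:

```python
# MLP-Mixer-style block over graph patch embeddings (illustrative sketch).
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    def __init__(self, num_patches: int, dim: int, hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mix = nn.Sequential(   # mixes information ACROSS graph patches
            nn.Linear(num_patches, hidden), nn.GELU(), nn.Linear(hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mix = nn.Sequential(  # mixes features WITHIN each patch
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim), each patch pooled from one graph partition
        x = x + self.token_mix(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mix(self.norm2(x))
        return x


# Usage: a graph split into 32 patches, each encoded into 64-dim features.
block = MixerBlock(num_patches=32, dim=64, hidden=128)
out = block(torch.randn(8, 32, 64))
print(out.shape)  # torch.Size([8, 32, 64])
```

Because the token-mixing MLP acts across all patches at once, information is exchanged globally in a single layer at a cost linear in the number of patches, which is the intuition behind avoiding both quadratic attention and deep message-passing stacks.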
This paper is devoted to the numerical resolution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25] in order to learn the solution on the Wasserstein space. We propose several algorithms either based on dynamic programming with control learning by policy or value iteration, or backward SDE from stochastic maximum principle with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms. We discuss and compare the pros and cons of all the tested methods.
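As a rough illustration of the mean-field ingredient, the sketch below encodes the measure argument through a cloud of particles with a permutation-invariant (mean-pooling) network; this is a generic construction under our own assumptions, not necessarily the architecture of [25]:

```python
# Generic "mean-field" network sketch: the measure is approximated by particles
# and encoded permutation-invariantly, then combined with the current state.
import torch
import torch.nn as nn


class MeanFieldNet(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.particle_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(state_dim + hidden, hidden),
                                  nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor, particles: torch.Tensor) -> torch.Tensor:
        # x: (batch, state_dim) current state
        # particles: (batch, n, state_dim), empirical measure mu ~ (1/n) sum_i delta_{particles_i}
        mu_feat = self.particle_enc(particles).mean(dim=1)   # permutation-invariant pooling
        return self.head(torch.cat([x, mu_feat], dim=-1))    # e.g. a value or control output


net = MeanFieldNet(state_dim=2)
v = net(torch.randn(16, 2), torch.randn(16, 500, 2))
print(v.shape)  # torch.Size([16, 1])
```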
This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular, for the sign-to-text direction. We use a Transformer model implemented with the Fairseq modeling toolkit. We have experimented with the vocabulary size, data augmentation techniques, and pretraining the model on the PHOENIX-14T dataset. Our system obtains a 0.50 BLEU score on the test set, improving the organizers' baseline by 0.38 BLEU. We note the poor results for both the baseline and our system and, thus, the unreliability of our findings.
This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We obtain that, when the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear neural network. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of neurons. We finally study the continuous-time limit obtained for infinitely wide linear neural networks and show that the linear predictors of the neural network converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
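A schematic restatement of the setting in our own notation; the paper's precise assumptions, scalings, and constants differ:

```latex
% Deep linear network with hidden width m and random initialization:
\[
  f_\theta(x) = W_L W_{L-1} \cdots W_1 x ,
\]
% trained by gradient descent on the empirical risk
\[
  \mathcal{R}(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \lVert f_\theta(x_i) - y_i \rVert^2 .
\]
% In the continuous-time, infinite-width limit, the induced linear predictor
% A(t) = W_L(t) \cdots W_1(t) is stated to converge exponentially fast to the
% minimal-norm minimizer of the risk:
\[
  \lVert A(t) - A^\star \rVert \le C e^{-\lambda t},
  \qquad
  A^\star = \operatorname*{arg\,min}_{A \,\in\, \arg\min \mathcal{R}} \lVert A \rVert_{\ell_2} .
\]
```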
Recent advances in deep learning models for sequence classification have greatly improved their classification accuracy, especially when large training sets are available. However, several works have suggested that under some settings the predictions made by these models are poorly calibrated. In this work we study binary sequence classification problems and we look at model calibration from a different perspective by asking the question: are deep learning models capable of learning the underlying target class distribution? We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep learning sequence classification models. We develop an evaluation that measures how well a classifier is learning the target class distribution. In addition, our evaluation disentangles good performance achieved by mere compression of the training sequences from performance achieved by proper model generalization. Our results suggest that in this binary setting the deep learning models are indeed able to learn the underlying class distribution in a non-trivial manner, i.e., by proper generalization beyond data compression.
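One simple instance of such a check, given here as an assumption rather than the paper's exact evaluation, is to compare the model's mean predicted positive probability with the empirical class prior:

```python
# Toy check of whether a classifier has learned a sparse class distribution:
# compare the mean predicted P(y=1) to the empirical positive rate.
import numpy as np


def predicted_vs_true_prior(probs: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """probs: predicted P(y=1) per sequence; labels: 0/1 ground truth."""
    return float(probs.mean()), float(labels.mean())


rng = np.random.default_rng(0)
labels = (rng.random(10_000) < 0.02).astype(int)                   # rare positive class (~2%)
probs = np.clip(0.02 + 0.01 * rng.standard_normal(10_000), 0, 1)   # toy predictions
print(predicted_vs_true_prior(probs, labels))                      # ~ (0.02, 0.02) if well matched
```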
The combination of machine learning models with physical models is a recent research path to learn robust data representations. In this paper, we introduce p$^3$VAE, a generative model that integrates a perfect physical model which partially explains the true underlying factors of variation in the data. To fully leverage our hybrid design, we propose a semi-supervised optimization procedure and an inference scheme that comes with meaningful uncertainty estimates. We apply p$^3$VAE to the semantic segmentation of high-resolution hyperspectral remote sensing images. Our experiments on a simulated data set demonstrate the benefits of our hybrid model over conventional machine learning models in terms of extrapolation capabilities and interpretability. In particular, we show that p$^3$VAE naturally has high disentanglement capabilities. Our code and data have been made publicly available at https://github.com/Romain3Ch216/p3VAE.
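A schematic of the hybrid design in our own notation; the exact latent factorization and objective used in p$^3$VAE may differ:

```latex
% Latent split into a physics part and a residual part; the decoder combines
% a fixed physical model F with a learned network g_theta.
\[
  z = (z_{\mathrm{phys}},\, z_{\mathrm{res}}), \qquad
  p_\theta(x \mid z) = p\big(x \mid F(z_{\mathrm{phys}}),\, g_\theta(z_{\mathrm{res}})\big),
\]
% A generic semi-supervised objective over unlabeled pixels x and labeled
% pairs (x, y), where y is the semantic label:
\[
  \mathcal{L}(\theta, \phi) =
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x) \,\Vert\, p(z)\big)
  + \lambda \sum_{(x, y) \in \mathcal{D}_{\mathrm{lab}}} \log q_\phi(y \mid x).
\]
```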
Over the past few years, neural networks (NNs) have moved from laboratory environments to the state of the art for many real-world problems. It has been shown that NN models (i.e., their weights and biases) evolve along unique trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a model zoo) forms structures in weight space. We argue that the geometry, curvature, and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such model zoos, one can investigate (i) novel approaches to model analysis, (ii) the discovery of unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting model zoos for generative modelling of NN weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of NNs. With this work, we release a novel model zoo dataset containing systematically generated and diverse populations of NN models for further research. In total, the proposed model zoo dataset is based on eight image datasets and consists of 27 model zoos trained with different hyperparameter combinations, comprising 50'360 unique NN models as well as their sparsified twins, resulting in over 3'844'360 collected models. Additionally, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks. The dataset is available at www.modelzoos.cc.
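A hedged sketch of a typical first step when working with such a zoo, flattening every checkpoint into a weight vector for downstream analysis in weight space; the file layout below is hypothetical and not the dataset's actual loading API (see www.modelzoos.cc for the real format):

```python
# Flatten each checkpoint of a model zoo into one weight vector per model.
# (Hypothetical directory layout, shown only to illustrate the idea.)
import glob
import torch


def flatten_state_dict(state_dict: dict) -> torch.Tensor:
    """Concatenate all weights and biases of one model into a single vector."""
    return torch.cat([p.flatten() for p in state_dict.values()])


vectors = []
for path in sorted(glob.glob("zoo_checkpoints/*.pt")):   # hypothetical path
    state = torch.load(path, map_location="cpu")
    vectors.append(flatten_state_dict(state))

if vectors:
    population = torch.stack(vectors)   # (num_models, num_parameters)
    print(population.shape)             # one row per NN in the zoo
```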